Regularized Decomposition of High-Dimensional Multistage Stochastic Programs with Markov Uncertainty
We develop a quadratic regularization approach for the solution of
high-dimensional multistage stochastic optimization problems characterized by a
potentially large number of time periods/stages (e.g. hundreds), a
high-dimensional resource state variable, and a Markov information process. The
resulting algorithms are shown to converge to an optimal policy after a finite
number of iterations under mild technical assumptions. Computational
experiments are conducted using the setting of optimizing energy storage over a
large transmission grid, which motivates both the spatial and temporal
dimensions of our problem. Our numerical results indicate that the proposed
methods exhibit significantly faster convergence than their classical
counterparts, with greater gains observed for higher-dimensional problems.
Efficient Ordered Combinatorial Semi-Bandits for Whole-Page Recommendation
The Multi-Armed Bandit (MAB) framework has been successfully applied in many web applications. However, many complex real-world applications that involve multiple content recommendations cannot fit into the traditional MAB setting. To address this issue, we consider an ordered combinatorial semi-bandit problem where the learner recommends S actions from a base set of K actions, and displays the results in S (out of M) different positions. The aim is to maximize the cumulative reward with respect to the best possible subset and positions in hindsight. By adapting a minimum-cost maximum-flow network, a practical algorithm based on Thompson sampling is derived for the (contextual) combinatorial problem, thus resolving the problem of computational intractability. The method can work with whole-page recommendation and any probabilistic model; to illustrate its effectiveness, we focus on Gaussian process optimization and a contextual setting where click-through rate is predicted using logistic regression. We demonstrate the algorithms’ performance on synthetic Gaussian process problems and on large-scale news article recommendation datasets from the Yahoo! Front Page Today Module.
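The selection step described in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes independent Beta-Bernoulli click models for every (action, position) pair, and it replaces the minimum-cost maximum-flow network with brute-force enumeration, which is only feasible for very small K, M, and S. The names `thompson_select` and `update` are hypothetical.

```python
import itertools
import random


def thompson_select(alpha, beta, S, rng):
    """Choose S distinct actions and S distinct positions by Thompson sampling.

    alpha[k][m] and beta[k][m] are Beta posterior parameters for the click
    probability of action k shown at position m (an assumed reward model).
    Requires S <= min(K, M).
    """
    K, M = len(alpha), len(alpha[0])
    # Sample one plausible click probability per (action, position) pair.
    theta = [[rng.betavariate(alpha[k][m], beta[k][m]) for m in range(M)]
             for k in range(K)]
    best_value, best_layout = -1.0, None
    # Brute-force stand-in for the flow-based assignment: try every subset of
    # positions and every ordered choice of actions to fill them.
    for positions in itertools.combinations(range(M), S):
        for actions in itertools.permutations(range(K), S):
            value = sum(theta[a][p] for a, p in zip(actions, positions))
            if value > best_value:
                best_value, best_layout = value, list(zip(actions, positions))
    return best_layout


def update(alpha, beta, layout, clicks):
    """Semi-bandit feedback: one click/no-click outcome per displayed pair."""
    for (a, p), clicked in zip(layout, clicks):
        if clicked:
            alpha[a][p] += 1.0
        else:
            beta[a][p] += 1.0
```

A round of interaction would call `thompson_select` with the current posteriors, display the returned (action, position) pairs, and pass the observed clicks to `update`. The enumeration cost grows combinatorially in K, M, and S, which is the intractability that the flow-based formulation in the abstract is meant to avoid.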